Sequential Sensing with Model Mismatch

Ruiyang Song, Yao Xie, and Sebastian Pokutta 


Abstract: We characterize the performance of sequential information
guided sensing, Info-Greedy Sensing [1], when there is a
mismatch between the true signal model and the assumed model,
which may be a sample estimate. In particular, we consider a setup
where the signal is low-rank Gaussian and the measurements
are taken in the directions of eigenvectors of the covariance
matrix $\Sigma$ in decreasing order of eigenvalues. We establish a set
of performance bounds when a mismatched covariance matrix
$\hat{\Sigma}$ is used, in terms of the gap in signal posterior entropy, as
well as the additional amount of power required to achieve the
same signal recovery precision. Based on this, we further study
how to choose an initialization for Info-Greedy Sensing using
the sample covariance matrix, or using an efficient covariance
sketching scheme.

Keywords—compressed sensing, information theory, sequential 
methods, high-dimensional statistics, sketching algorithms 


I. Introduction 

Sequential compressed sensing is a promising new information
acquisition and recovery technique to process big data
that arise in various applications such as compressive imaging,
power network monitoring, and large scale sensor
networks [6]. The sequential nature of the problems arises either
because the measurements are taken one after another, or because
the data is obtained in a streaming fashion so
that it has to be processed in one pass.

To harvest the benefits of adaptivity in sequential compressed
sensing, various algorithms have been developed (see [1] for
a review). We may classify these algorithms as (1) being
agnostic about the signal distribution and, hence, using random
measurements; (2) exploiting additional structure of
the signal (such as graphical structure and tree-sparse
structure) to design measurements; (3) exploiting
the distributional information of the signal in choosing the
measurements, possibly through maximizing mutual information:
the seminal Bayesian compressive sensing work [17], Gaussian
mixture models (GMM) [18], [19], and our earlier work [1],
which presents a general framework for information guided
sensing referred to as Info-Greedy Sensing.

In this paper we consider the setup of Info-Greedy Sensing [1],
as it provides certain optimality guarantees. Info-Greedy Sensing
aims at designing subsequent measurements to maximize the
mutual information conditioned on previous measurements.
Conditional mutual information is a natural metric here, as
it captures exclusively the useful new information between the
signal and the result of the measurement, disregarding noise and
what has already been learned from previous measurements.
It was shown in [1] that Info-Greedy Sensing for a Gaussian
signal is equivalent to choosing the sequential measurement
vectors $a_1, a_2, \ldots$ as the orthonormal eigenvectors of $\Sigma$ in
decreasing order of eigenvalues.

In practice, we do not know the signal covariance matrix $\Sigma$
and have to use a sample covariance matrix $\hat{\Sigma}$ as an estimate.
As a consequence, the measurement vectors are calculated from
$\hat{\Sigma}$ and deviate from the optimal directions. Since we almost
always have to use some estimate for the signal covariance, it
is important to quantify the performance of sensing algorithms
with model mismatch.

In this paper, we characterize the performance of Info-Greedy
Sensing for Gaussian signals [1] when the true signal covariance
matrix is replaced with a proxy, which may be an estimate from
direct samples or from a covariance sketching scheme. We
establish a set of theoretical results including (1) relating the
error in the covariance matrix $\|\hat{\Sigma} - \Sigma\|$ to the entropy of the
signal posterior distribution after each sequential measurement,
and thus characterizing the gap between this entropy and
the entropy when the correct covariance matrix is used; (2)
establishing an upper bound on the amount of additional power
required to achieve the same precision of the recovered signal
when using an estimated covariance matrix; (3) when initializing
Info-Greedy Sensing via a sample covariance matrix, finding the
minimum number of samples required so that such an
initialization achieves good performance; (4) presenting a
covariance sketching scheme to initialize Info-Greedy Sensing
and finding the conditions under which such an initialization is
sufficient. We also present a numerical example to demonstrate
the good performance of Info-Greedy Sensing compared to a
batch method (where measurements are not adaptive) when
there is mismatch.

Our notation is standard. Denote $[n] = \{1, 2, \ldots, n\}$; $\|X\|$
is the spectral norm of a matrix $X$, $\|X\|_F$ denotes the Frobenius
norm of $X$, and $\|X\|_*$ represents the nuclear norm of
$X$; $\|x\|$ is the $\ell_2$ norm of a vector $x$, and $\|x\|_1$ is the
$\ell_1$ norm of $x$; let $\chi_n^2$ be the quantile function of the
chi-squared distribution with $n$ degrees of freedom; let $\mathbb{E}[x]$ and
$\mathrm{Var}[x]$ denote the mean and the variance of a random variable
$x$; $X \succeq 0$ means that the matrix $X$ is positive semi-definite.

II. Problem setup 

A typical sequential compressed sensing setup is as follows.
Let $x \in \mathbb{R}^n$ be an unknown $n$-dimensional signal. We make $K$
measurements of $x$ sequentially

$$y_k = a_k^T x + w_k, \quad k = 1, \ldots, K,$$

and the power of the $k$th measurement is $\|a_k\|^2 = \beta_k$. The goal is to
recover $x$ using the measurements $\{y_k\}_{k=1}^K$. Consider a Gaussian
signal $x \sim \mathcal{N}(0, \Sigma)$ with known zero mean and covariance
matrix $\Sigma$ (here without loss of generality we have assumed the
signal has zero mean). Assume the rank of $\Sigma$ is $s$ and the signal
can be low-rank, $s \ll n$. Info-Greedy Sensing [1] chooses each
measurement to maximize the conditional mutual information

$$a_k \leftarrow \arg\max_{a} \; I[x;\, a^T x + w \mid y_j, a_j, j < k] / \beta_k.$$

The goal is to use the minimum number of measurements (or total
power) so that the estimated signal is recovered with precision
$\varepsilon$: $\|\hat{x} - x\| < \varepsilon$ with high probability.

Fig. 1: Parameter update in the algorithm and for the true distribution.

In [1], we devised a solution to the above problem, and
established that Info-Greedy Sensing for a low-rank Gaussian
signal measures in the directions of the eigenvectors of
$\Sigma$ in decreasing order of eigenvalues, with power allocation
depending on the noise variance, signal recovery precision $\varepsilon$
and confidence level $p$, as given in Algorithm 1.

Ideally, if we know the true signal covariance, we use
its eigenvectors to form measurements. However,
in practice, we have to use an estimate of the covariance matrix,
which usually has errors. To establish performance bounds when
there is a mismatch between the assumed and the true covariance
matrix, we adopt as a metric the posterior entropy of
the signal conditioned on previous measurement outcomes. The
entropy of a Gaussian signal $x \sim \mathcal{N}(\mu, \Sigma)$ is given by

$$H[x] = \frac{1}{2} \ln\big((2\pi e)^n \det(\Sigma)\big).$$
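As a small illustration of the entropy formula, the sketch below evaluates it with NumPy. The second function is an assumed low-rank variant that keeps only the eigenvalues above a tolerance `tol` (a hypothetical parameter) and takes the volume as the square root of their product, so that it agrees with the full-rank formula whenever $\Sigma$ is invertible. The function names are illustrative only.

```python
import numpy as np

def gaussian_entropy(Sigma):
    """H[x] = 0.5 * ln((2*pi*e)^n * det(Sigma)) for a full-rank covariance."""
    n = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def low_rank_entropy(Sigma, tol=1e-10):
    """Same quantity restricted to the eigenvalues larger than tol (assumed cutoff)."""
    eig = np.linalg.eigvalsh(Sigma)
    nonzero = eig[eig > tol]
    s = len(nonzero)
    return 0.5 * (s * np.log(2 * np.pi * np.e) + np.sum(np.log(nonzero)))
```

For an invertible covariance the two functions return the same value; for a rank-deficient one only the second is finite.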


Algorithm 1 Info-Greedy Sensing for Gaussian signals

Require: assumed signal mean $\theta$ and covariance matrix $\Gamma$,
noise variance $\sigma^2$, recovery accuracy $\varepsilon$, confidence level $p$

1: repeat
2:   $\lambda \leftarrow \|\Gamma\|$   {largest eigenvalue}
3:   $\beta \leftarrow (\chi_n^2(p)/\varepsilon^2 - 1/\lambda)\sigma^2$
4:   $u \leftarrow$ normalized eigenvector of $\Gamma$ for eigenvalue $\lambda$
5:   form measurement: $a = \sqrt{\beta}\, u$
6:   measure: $y = a^T x + w$
7:   update mean: $\theta \leftarrow \theta + \Gamma a (y - a^T \theta)/(\beta\lambda + \sigma^2)$
8:   update covariance: $\Gamma \leftarrow \Gamma - \Gamma a a^T \Gamma/(\beta\lambda + \sigma^2)$
9: until $\|\Gamma\| \le \varepsilon^2/\chi_n^2(p)$   {all eigenvalues become small}
10: return posterior mean $\theta$ as a signal estimate $\hat{x}$
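The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the chi-squared quantile $\chi_n^2(p)$ is passed in as the number `chi2_quantile` rather than computed, the update denominators use $a^T \Gamma a + \sigma^2 = \beta\lambda + \sigma^2$, and the small relative slack in the stopping test is an assumption added to absorb floating-point rounding.

```python
import numpy as np

def info_greedy_gaussian(x, theta, Gamma, sigma2, eps, chi2_quantile, rng):
    """Run Algorithm 1 on signal x; returns (posterior mean, posterior cov, powers)."""
    theta = np.array(theta, dtype=float)
    Gamma = np.array(Gamma, dtype=float)
    threshold = eps**2 / chi2_quantile
    powers = []
    for _ in range(len(theta)):
        eigvals, eigvecs = np.linalg.eigh(Gamma)
        lam, u = eigvals[-1], eigvecs[:, -1]      # largest eigenpair
        if lam <= threshold * (1 + 1e-9):         # all eigenvalues are small
            break
        beta = (chi2_quantile / eps**2 - 1.0 / lam) * sigma2
        a = np.sqrt(beta) * u                     # measurement vector
        y = a @ x + rng.normal(scale=np.sqrt(sigma2))
        Ga = Gamma @ a
        denom = beta * lam + sigma2               # equals a^T Gamma a + sigma^2
        theta = theta + Ga * (y - a @ theta) / denom
        Gamma = Gamma - np.outer(Ga, Ga) / denom
        powers.append(beta)
    return theta, Gamma, powers
```

With this power allocation each measured eigenvalue is reduced to exactly $\varepsilon^2/\chi_n^2(p)$, so the number of measurements equals the number of eigenvalues of the initial $\Gamma$ above that threshold.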


Hence, the conditional mutual information is essentially the
log of the determinant of the conditional covariance matrix,
or equivalently the log of the volume of the ellipsoid defined
by the covariance matrix. Here, to accommodate the scenario
where the covariance matrix is low-rank, we consider a modified
definition of conditional entropy, which is the log of the volume
of the ellipsoid on the low-dimensional space. Let $\Sigma_k$ be the
underlying true signal covariance conditioned on the previous
$k$ measurements; denote by $\hat{\Sigma}_k$ the observed covariance matrix,
which is also the output of the sequential algorithm. Assume
the rank of $\Sigma$ is $s$. Then the metric we use to track the progress
of our algorithm is

$$H[x \mid y_j, a_j, j \le k] = \ln\big((2\pi e)^{s/2}\, \mathrm{Vol}(\Sigma_k)\big),$$

where $\mathrm{Vol}(\Sigma_k)$ is the volume of the ellipsoid defined by the
covariance matrix $\Sigma_k$, which is equal to the square root of the
product of its non-zero eigenvalues, so that the full-rank case
reduces to the Gaussian entropy formula.

III. Performance bounds

We analyze the performance of Info-Greedy Sensing when
an assumed covariance matrix $\hat{\Sigma}$, which is different from the
true signal covariance matrix $\Sigma$, is used for measurement design,
i.e., $\hat{\Sigma}$ is used to initialize Algorithm 1. Let the eigenpairs of
$\Sigma$ with the eigenvalues (which can be zero) ranked from the
largest to the smallest be $(\lambda_1, u_1), (\lambda_2, u_2), \ldots, (\lambda_n, u_n)$,
and let the eigenpairs of $\hat{\Sigma}$ with the eigenvalues (which
can be zero) ranked from the largest to the smallest be
$(\hat{\lambda}_1, \hat{u}_1), (\hat{\lambda}_2, \hat{u}_2), \ldots, (\hat{\lambda}_n, \hat{u}_n)$. Let the updated covariance
matrix in Algorithm 1 starting from $\hat{\Sigma}$ after $k$ measurements
using $\{a_i\}_{i=1}^k$ be $\hat{\Sigma}_k$, and the true conditional covariance matrix
of the signal after these measurements be $\Sigma_k$. The evolution of
the covariance matrices in Algorithm 1 is illustrated in Fig. 1.
With this notation, since each time we measure in the direction
of the dominant eigenvector of the updated covariance matrix,
we have that $(\lambda_k, u_k)$ is the largest eigenpair of $\Sigma_{k-1}$, and that
$(\hat{\lambda}_k, \hat{u}_k)$ is the largest eigenpair of $\hat{\Sigma}_{k-1}$. Furthermore, denote
the difference between the assumed and the true conditional
covariance matrices after we obtain $k$ measurements by

$$E_k = \hat{\Sigma}_k - \Sigma_k,$$

and let

$$\delta_k = \|E_k\|.$$

Assume the eigenvalues of $E_k$ are $e_1 \ge e_2 \ge \cdots \ge e_n$. Then
$\delta_k = \max\{|e_1|, |e_n|\}$.


A. Deterministic error 


The following theorem shows that when the error $\|\hat{\Sigma} - \Sigma\|$
is sufficiently small, the performance of Info-Greedy Sensing
will not degrade much. Note, however, that if the power
allocations $\beta_k$ are calculated using the eigenvalues of the
assumed covariance matrix $\hat{\Sigma}$, after $K = s$ iterations we do
not necessarily reach the desired precision $\varepsilon$ with probability $p$.

Theorem 1. Assume the power allocations $\beta_k = (\chi_n^2(p)/\varepsilon^2 - 1/\hat{\lambda}_k)\sigma^2$
are calculated using the eigenvalues $\hat{\lambda}_k$ of $\hat{\Sigma}$, the noise
variance $\sigma^2$, recovery accuracy $\varepsilon$ and confidence level $p$ in
Algorithm 1. Given that the rank of the covariance matrix is
$\mathrm{rank}(\Sigma) = s$ and the number of total measurements is $K$, for some
constant $0 < \zeta < 1$, if the error satisfies

$$\|\hat{\Sigma} - \Sigma\| \le \frac{\zeta \varepsilon^2}{4^{K+1} \chi_n^2(p)},$$

then

$$H[x \mid y_j, a_j, j \le k] \le \frac{s}{2} \ln[2\pi e\, \mathrm{tr}(\Sigma)] - \frac{s}{2} \sum_{j=1}^{k} \ln(1/f_j), \qquad (1)$$

where

$$f_k = 1 - \frac{1-\zeta}{s} \cdot \frac{\beta_k \hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2} \in (0, 1), \quad k = 1, \ldots, K.$$


In the proof of Theorem 1 we use the trace of the underlying
actual covariance matrix, $\mathrm{tr}(\Sigma_k)$, as a potential function, which
serves as a surrogate for the product of eigenvalues that
determines the entropy, since the calculation of the trace of the
observed covariance matrix $\mathrm{tr}(\hat{\Sigma}_k)$ is much easier. Note that
for an assumed covariance matrix $\hat{\Sigma}$, after measuring in the
direction of a unit norm eigenvector $u$ with eigenvalue $\lambda$ using
power $\beta$, the updated matrix takes the form

$$\hat{\Sigma} - \hat{\Sigma} \sqrt{\beta} u \big(\sqrt{\beta} u^T \hat{\Sigma} \sqrt{\beta} u + \sigma^2\big)^{-1} \sqrt{\beta} u^T \hat{\Sigma} = \frac{\lambda \sigma^2}{\beta \lambda + \sigma^2}\, u u^T + \hat{\Sigma}^{\perp u}, \qquad (2)$$

where $\hat{\Sigma}^{\perp u}$ is the component of $\hat{\Sigma}$ in the orthogonal complement
of $u$. Thus, the only change in the eigen-decomposition of $\hat{\Sigma}$
is the update of the eigenvalue of $u$ from $\lambda$ to $\lambda\sigma^2/(\beta\lambda + \sigma^2)$.
Based on the update in (2), after one measurement, the
trace of the covariance matrix that the algorithm keeps track of
becomes

$$\mathrm{tr}(\hat{\Sigma}_k) = \mathrm{tr}(\hat{\Sigma}_{k-1}) - \frac{\beta_k \hat{\lambda}_k^2}{\beta_k \hat{\lambda}_k + \sigma^2}.$$
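The eigenvalue update and the resulting drop in the trace can be checked numerically. The sketch below is illustrative only: the matrix, power and noise variance are arbitrary choices, and `measure_update` is a hypothetical helper name for the covariance update after measuring along $a = \sqrt{\beta}\,u$.

```python
import numpy as np

def measure_update(Sigma, u, beta, sigma2):
    """Covariance after one measurement a = sqrt(beta) * u with noise variance sigma2."""
    a = np.sqrt(beta) * u
    Sa = Sigma @ a
    return Sigma - np.outer(Sa, Sa) / (a @ Sa + sigma2)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal eigenvectors
eigs = np.array([3.0, 2.0, 1.0, 0.5, 0.0])
Sigma = Q @ np.diag(eigs) @ Q.T

beta, sigma2 = 2.0, 0.1
updated = measure_update(Sigma, Q[:, 0], beta, sigma2)

lam = eigs[0]
new_lam = lam * sigma2 / (beta * lam + sigma2)       # predicted new eigenvalue
trace_drop = beta * lam**2 / (beta * lam + sigma2)   # predicted decrease of the trace
```

Only the eigenvalue along the measured direction changes; the remaining eigenvalues of `updated` are those of `Sigma`.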


Remark 1. The upper bound of the posterior signal entropy
in (1) shows that the amount of uncertainty reduction by the
$k$th measurement is roughly $(s/2) \ln(1/f_k)$.


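Remark 2. Using the inequality $\ln(1 - x) \le -x$ for $x \in (0, 1)$,
we have in (1)

$$H[x \mid y_j, a_j, j \le k] \le \frac{s}{2} \ln[2\pi e\, \mathrm{tr}(\Sigma)] - \frac{1-\zeta}{2} \sum_{j=1}^{k} \frac{\beta_j \hat{\lambda}_j}{\beta_j \hat{\lambda}_j + \sigma^2}$$

$$= \frac{s}{2} \ln[2\pi e\, \mathrm{tr}(\Sigma)] - \frac{(1-\zeta) k}{2} + \frac{(1-\zeta) \varepsilon^2}{2 \chi_n^2(p)} \sum_{j=1}^{k} \frac{1}{\hat{\lambda}_j}.$$

On the other hand, if the true covariance matrix is used, the
posterior entropy of the signal is given by

$$H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le k] = \frac{1}{2} \ln\Big[(2\pi e)^s \prod_{i=1}^{s} \lambda_i\Big] - \frac{1}{2} \sum_{j=1}^{k} \ln\Big(1 + \frac{\tilde{\beta}_j \lambda_j}{\sigma^2}\Big), \qquad (3)$$

where $\tilde{\beta}_j = (\chi_n^2(p)/\varepsilon^2 - 1/\lambda_j)\sigma^2$. Hence, we have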


$$H[x \mid y_j, a_j, j \le k] \le H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le k] + \cdots \ln \frac{\mathrm{tr}(\Sigma)}{\cdots} + \cdots \qquad (4)$$


This upper bound has a nice interpretation: it characterizes
the amount of uncertainty reduction with each measurement.
For example, when the number of measurements required when
using the assumed covariance matrix and when using the true
covariance matrix are the same, we have $\hat{\lambda}_j \ge \varepsilon^2/\chi_n^2(p)$ and
$\lambda_j \ge \varepsilon^2/\chi_n^2(p)$. Hence, the third term in (4) is upper bounded
by $-k/2$, which means that the amount of reduction in entropy
is roughly 1/2 nat per measurement.


Remark 3. Consider the special case where the errors only
occur in the eigenvalues of the matrix but not in the eigenspace
$U$, i.e.,

$$\hat{\Sigma} - \Sigma = U\, \mathrm{diag}\{e_1, \ldots, e_s\}\, U^T$$

and $\max_{1 \le j \le s} |e_j| = \delta_0$. Then the upper bound above can be further
simplified. Suppose only the first $K$ ($K \le s$) largest eigenvalues
of $\hat{\Sigma}$ are larger than the stopping criterion $\varepsilon^2/\chi_n^2(p)$ required
by the precision, i.e., the algorithm takes $K$ steps in total. Then

$$H[x \mid y_j, a_j, j \le K] \le H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le K] + K \ln(1 + \cdots) + \sum_{j=K+1}^{s} \ln(1 + \cdots).$$

This characterizes the gap between the signal posterior entropy 
using the correct versus the incorrect covariance matrices after 
all measurements have been used. 


If we allow more total power and use a different power
allocation scheme than what is prescribed in Algorithm 1, we
are able to reach the desired precision $\varepsilon$. The following theorem
establishes an upper bound on the amount of extra total power
needed to reach the same precision $\varepsilon$, compared to the total power
$P_{\mathrm{ideal}}$ required when using the correct covariance matrix.


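Theorem 2. Given the recovery precision $\varepsilon$, confidence level
$p$, and rank of the true covariance matrix $\mathrm{rank}(\Sigma) = s$, assume
$K \le s$ eigenvalues of $\Sigma$ are larger than $\varepsilon^2/\chi_n^2(p)$. If

$$\|\hat{\Sigma} - \Sigma\| \le \frac{\varepsilon^2}{4^{s+1} \chi_n^2(p)},$$

then to reach a precision $\varepsilon$ at confidence level $p$, the total
power $P_{\mathrm{mismatch}}$ required by Algorithm 1 when using $\hat{\Sigma}$ is
upper bounded by

$$P_{\mathrm{mismatch}} < P_{\mathrm{ideal}} + \Big[\frac{20}{51} s + \frac{1}{272} K\Big] \frac{\chi_n^2(p)}{\varepsilon^2} \sigma^2.$$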

Remark 4. In the special case when $K = s$ eigenvalues of $\Sigma$
are larger than $\varepsilon^2/\chi_n^2(p)$, under the condition of Theorem
2 we have a simpler expression for the upper bound

$$P_{\mathrm{mismatch}} \le P_{\mathrm{ideal}} + \frac{323}{816} \cdot \frac{\chi_n^2(p)}{\varepsilon^2} \sigma^2 s.$$

Note that the additional power required is only linear in $s$,
which is quite small. All other parameters are independent of
the input matrix.
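The constants in Theorem 2 and Remark 4 can be cross-checked with exact arithmetic. The sketch below assumes the coefficients are $20/51$ for $s$ and $1/272$ for $K$, which is the reading consistent with the $323/816$ of Remark 4 when $K = s$; the function names and the argument `chi2_quantile` (standing for $\chi_n^2(p)$) are illustrative.

```python
from fractions import Fraction

def ideal_power(eigvals, eps, chi2_quantile, sigma2):
    """Total power when the true covariance is used: sum over eigenvalues above eps^2/chi2."""
    threshold = eps**2 / chi2_quantile
    return sum((chi2_quantile / eps**2 - 1.0 / lam) * sigma2
               for lam in eigvals if lam > threshold)

def extra_power_bound(s, K, eps, chi2_quantile, sigma2):
    """Upper bound on P_mismatch - P_ideal under the assumed coefficients."""
    coeff = Fraction(20, 51) * s + Fraction(1, 272) * K
    return float(coeff) * chi2_quantile / eps**2 * sigma2

# With K = s the two coefficients combine into the constant of Remark 4.
combined = Fraction(20, 51) + Fraction(1, 272)
```

Here `combined` equals `Fraction(323, 816)`, so the bound grows linearly in $s$ with a slope below $0.4\,\chi_n^2(p)\sigma^2/\varepsilon^2$.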

Also, note that when there is a mismatch in the assumed
covariance matrix, better performance can be achieved by
making many low power measurements rather than one full
power measurement, because we update the assumed covariance
matrix in between.


B. Initialization with sample covariance matrix 

In practice, we usually use a sample covariance matrix for
$\hat{\Sigma}$. When the samples are Gaussian distributed, the sample
covariance matrix follows a Wishart distribution. By finding
the tail probability of the Wishart distribution, we are able to
establish a lower bound on the number of samples to form the
sample covariance matrix so that the conditions required by
Theorem 1 are met with high probability and, hence, Algorithm
1 has good performance with the assumed matrix $\hat{\Sigma}$.

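Corollary 1. Suppose the sample covariance matrix is obtained
from training samples $\tilde{x}_1, \ldots, \tilde{x}_L$ that are drawn i.i.d. from
$\mathcal{N}(0, \Sigma)$:

$$\hat{\Sigma} = \frac{1}{L} \sum_{i=1}^{L} \tilde{x}_i \tilde{x}_i^T.$$

Let $\delta_0 > 0$ be the target bound on $\|\hat{\Sigma} - \Sigma\|$. When

$$L \ge 4 n^{1/2}\, \mathrm{tr}(\Sigma) \Big(\frac{\|\Sigma\|}{\delta_0^2} + \frac{4}{\delta_0}\Big),$$

we have $\|\hat{\Sigma} - \Sigma\| \le \delta_0$ with probability exceeding
$1 - 2n \exp(-\sqrt{n})$.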
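The dependence of the spectral-norm error on the number of training samples can be observed empirically. The sketch below is an illustration under assumed settings (dimension 20, rank 3, a fixed random seed); it draws low-rank Gaussian samples through a factor `F` with $\Sigma = F F^T$ and does not evaluate the bound of the corollary itself.

```python
import numpy as np

def sample_covariance_error(F, L, rng):
    """Spectral-norm error of the sample covariance built from L draws of N(0, F F^T)."""
    n, s = F.shape
    X = rng.standard_normal((L, s)) @ F.T      # rows are samples x_i = F z_i
    Sigma_hat = X.T @ X / L                    # zero-mean sample covariance
    return np.linalg.norm(Sigma_hat - F @ F.T, 2)

rng = np.random.default_rng(1)
F = rng.standard_normal((20, 3))               # rank-3 covariance in dimension 20
errors = {L: sample_covariance_error(F, L, rng) for L in (50, 500, 5000, 50000)}
```

The entries of `errors` shrink roughly like $1/\sqrt{L}$, which is the behavior the sample-size condition is designed to exploit.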


C. Initialization with covariance sketching 

We may also use a covariance sketching scheme to form an
estimate of the covariance matrix to initialize the algorithm,
as illustrated in Fig. 2. Covariance sketching is based on
sketches $\gamma_i$, $i = 1, \ldots, M$, of the samples $x_j$, $j = 1, \ldots, N$,
drawn from the signal distribution. The sketches are formed by
linearly projecting these samples via random sketching vectors
$b_i$, $i = 1, \ldots, M$, and then computing the average energy over $L$
repetitions. The sketching can be shown to be a linear operator
$\mathcal{B}$ applied on the original covariance matrix $\Sigma$, as demonstrated
in Appendix A. Then we may recover the original covariance
matrix from these sketches $\gamma$ by solving the following convex
program

$$\hat{\Sigma} = \arg\min_{X} \mathrm{tr}(X) \quad \text{subject to } X \succeq 0, \; \|\gamma - \mathcal{B}(X)\|_1 \le \tau, \qquad (5)$$

where $\tau$ is a user parameter that specifies the noise level.

We further establish conditions on the covariance sketching so 
that such an initialization for Info-Greedy Sensing is sufficient. 





Fig. 2: Diagram of covariance sketching in our setting. The circle 
aggregates quadratic sketches from branches and computes the average. 


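Theorem 3. Assume the setup of covariance sketching as
above. Then with probability exceeding $1 - 2/n - 2/\sqrt{n} -
2n \exp(-\sqrt{n}) - \exp(-c_0 c_1 n s)$, the solution to (5) satisfies

$$\|\hat{\Sigma} - \Sigma\| \le \delta_0,$$

for some $\delta_0 > 0$, as long as for some constant $c > 0$ the
parameters $M$, $N$, $L$, and $\tau$ are chosen such that

$$M = cns > c_0 n s,$$

$$N \ge 4 n^{1/2}\, \mathrm{tr}(\Sigma) \Big(\frac{36 c^2 n^4 s^2 \|\Sigma\|}{\tau^2} + \frac{24 c n^2 s}{\tau}\Big),$$

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|} \sigma^2, \; \frac{1}{\sqrt{2\, \mathrm{tr}(\Sigma) \|\Sigma\| c s n^3}} \sigma^2, \; \frac{6cns}{\tau} \sigma^2\Big\},$$

$$\tau = c n s \delta_0 / C_2.$$

Here $c_0$, $c_1$, $C_1$, and $C_2$ are absolute constants.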

IV. Numerical example 

When the assumed covariance matrix for the signal $x$ is equal
to its true covariance matrix, Info-Greedy Sensing is identical
to the batch method [19] (the batch method measures using the
largest eigenvectors of the signal covariance matrix). However,
when there is a mismatch between the two, Info-Greedy Sensing
outperforms the batch method due to its adaptivity, as shown
by the example in Fig. 3. Info-Greedy Sensing
also outperforms the sensing algorithm where the $a_k$ are chosen to
be random Gaussian vectors with the same power allocation,
as it uses prior knowledge (albeit imprecise) about the
signal distribution.


Fig. 3: Sensing a low-rank Gaussian signal of dimension n = 500
with about 5% of the eigenvalues non-zero, when there is mismatch
between the assumed covariance matrix and the true covariance matrix:
$\Sigma_{x,\mathrm{assumed}} = \Sigma_{x,\mathrm{true}} + ee^T$, where $e \sim \mathcal{N}(0, I)$, using
20 measurements. The batch method measures using the largest
eigenvectors of $\Sigma_{x,\mathrm{assumed}}$, and Info-Greedy Sensing updates
$\Sigma_{x,\mathrm{assumed}}$ in the algorithm. Info-Greedy Sensing is more robust to
mismatch than the batch method.

V. Discussion

In high-dimensional problems, a commonly used low-dimensional
signal model for $x$ is to assume the signal lies
in a subspace plus Gaussian noise, which corresponds to the
case we considered in this paper where the signal covariance
is low-rank. A more general model is the Gaussian mixture
model (GMM), which can be viewed as a model for the signal
lying in a union of multiple subspaces plus Gaussian noise, and
it has been widely used in image and video analysis among
others. Our analysis for a low-rank Gaussian signal can be easily
extended to an analysis of a low-rank Gaussian mixture model
(GMM). Such results for GMM are quite general and can be
used for an arbitrary signal distribution. In fact, parameterizing
via low-rank GMMs is a popular way to approximate complex
densities for high-dimensional data. Hence, we may be able
to couple the results for Info-Greedy Sensing of GMM with
the recently developed methods of scalable multi-scale density
estimation based on empirical Bayes [20] to create powerful
tools for information guided sensing for a general signal model.
We may also be able to obtain performance guarantees using
multiplicative weight update techniques together with the error
bounds in [20].
Appendix A

Covariance sketching

We consider the following setup for covariance sketching.
Suppose we are able to form measurements of the form
$y = a^T x + w$, as in the Info-Greedy Sensing algorithm.
Suppose there are $N$ copies of the Gaussian signal we would like
to sketch, $x_1, \ldots, x_N$, sampled i.i.d. from $\mathcal{N}(0, \Sigma)$, and
we sketch using $M$ random vectors $b_1, \ldots, b_M$. Then for each
fixed sketching vector $b_i$ and fixed copy of the signal $x_j$, we
acquire $L$ noisy realizations of the projection result $y_{ijl}$ via

$$y_{ijl} = b_i^T x_j + w_{ijl}, \quad l = 1, \ldots, L.$$

We choose the random sketching vectors $b_i$ as i.i.d. Gaussian
with zero mean and covariance matrix equal to the identity
matrix. Then we average $y_{ijl}$ over all realizations $l = 1, \ldots, L$
to form the $i$th sketch $\bar{y}_{ij}$ for a single copy $x_j$:

$$\bar{y}_{ij} = \frac{1}{L} \sum_{l=1}^{L} y_{ijl} = b_i^T x_j + \bar{w}_{ij}.$$

The average is introduced to suppress measurement noise, which
can be viewed as a generalization of sketching using just one
sample. Denote $\bar{w}_{ij} = \frac{1}{L} \sum_{l=1}^{L} w_{ijl}$, which is distributed as
$\mathcal{N}(0, \sigma^2/L)$. Then we use the average energy of the sketches
as our data $\gamma_i$, $i = 1, \ldots, M$, for covariance recovery:

$$\gamma_i = \frac{1}{N} \sum_{j=1}^{N} \bar{y}_{ij}^2.$$

Note that $\gamma_i$ can be further expanded as

$$\gamma_i = \mathrm{tr}(\hat{\Sigma}_N b_i b_i^T) + \frac{2}{N} \sum_{j=1}^{N} \bar{w}_{ij}\, b_i^T x_j + \frac{1}{N} \sum_{j=1}^{N} \bar{w}_{ij}^2, \qquad (6)$$

where

$$\hat{\Sigma}_N = \frac{1}{N} \sum_{j=1}^{N} x_j x_j^T$$

is the maximum likelihood estimate of $\Sigma$ (and is also unbiased).
We can write this in vector-matrix notation as follows. Let
$\gamma = [\gamma_1, \ldots, \gamma_M]^T$. Define a linear operator $\mathcal{B}: \mathbb{R}^{n \times n} \to \mathbb{R}^M$
such that $[\mathcal{B}(X)]_i = \mathrm{tr}(X b_i b_i^T)$. Thus, we can write (6) as a
linear measurement of the true covariance matrix $\Sigma$,

$$\gamma = \mathcal{B}(\Sigma) + \eta,$$

where $\eta \in \mathbb{R}^M$ contains all the error terms and corresponds to
the noise in our covariance sketching measurements, with the
$i$th entry given by

$$\eta_i = b_i^T (\hat{\Sigma}_N - \Sigma) b_i + \frac{2}{N} \sum_{j=1}^{N} \bar{w}_{ij}\, b_i^T x_j + \frac{1}{N} \sum_{j=1}^{N} \bar{w}_{ij}^2.$$

Note that we can further bound the $\ell_1$ norm of the error term
as

$$\|\eta\|_1 \le \|\hat{\Sigma}_N - \Sigma\|\, b + 2 \sum_{i=1}^{M} |z_i| + w,$$

where

$$b = \sum_{i=1}^{M} \|b_i\|^2, \quad \mathbb{E}[b] = Mn, \quad \mathrm{Var}[b] = 2Mn,$$

$$w = \frac{1}{N} \sum_{i=1}^{M} \sum_{j=1}^{N} \bar{w}_{ij}^2, \quad \mathbb{E}[w] = M\sigma^2/L, \quad \mathrm{Var}[w] = \frac{2M\sigma^4}{N L^2},$$

$$z_i = \frac{1}{N} \sum_{j=1}^{N} \bar{w}_{ij}\, b_i^T x_j, \quad \mathbb{E}[z_i] = 0, \quad \mathrm{Var}[z_i] = \frac{\sigma^2\, \mathrm{tr}(\Sigma)}{NL}.$$

We may recover the true covariance matrix from the sketches
$\gamma$ using the convex optimization problem (5).
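The sketching procedure described in this appendix can be simulated as follows. This is a minimal sketch under assumed names (`sketch_operator`, `covariance_sketches`); it forms the data $\gamma_i$ from samples and evaluates the linear operator with $[\mathcal{B}(X)]_i = b_i^T X b_i$, but does not solve the convex recovery program.

```python
import numpy as np

def sketch_operator(X, B):
    """[B(X)]_i = tr(X b_i b_i^T) = b_i^T X b_i, where the rows of B are the b_i."""
    return np.einsum("ij,jk,ik->i", B, X, B)

def covariance_sketches(samples, B, L, sigma2, rng):
    """gamma_i = (1/N) * sum_j (average over l of y_ijl)^2, y_ijl = b_i^T x_j + w_ijl."""
    proj = B @ samples.T                                   # (M, N): b_i^T x_j
    noise = rng.normal(scale=np.sqrt(sigma2), size=proj.shape + (L,))
    y_bar = proj + noise.mean(axis=2)                      # average over L repetitions
    return (y_bar ** 2).mean(axis=1)                       # average energy over N copies
```

In the noiseless case (`sigma2 = 0`) the sketches coincide exactly with the operator applied to the sample covariance of the $N$ copies, which is the first term of the expansion of $\gamma_i$.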


Appendix B 
Backgrounds 


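Lemma 1. [21] Let $\Sigma, \hat{\Sigma} \in \mathbb{R}^{p \times p}$ be symmetric, with
eigenvalues $\lambda_1 \ge \cdots \ge \lambda_p$ and $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_p$ respectively.
Suppose $E = \hat{\Sigma} - \Sigma$ has eigenvalues $e_1 \ge \cdots \ge e_p$. Then for each
$i \in \{1, \ldots, p\}$,

$$\hat{\lambda}_i \in [\lambda_i + e_p, \; \lambda_i + e_1].$$

Lemma 2. [22] Denote by $\mathcal{A}: \mathbb{R}^{n \times n} \to \mathbb{R}^m$ a linear operator
with $\mathcal{A}(X) = \{a_i^T X a_i\}_{i=1}^m$ for $X \in \mathbb{R}^{n \times n}$. Suppose the
measurement is contaminated by noise $\eta \in \mathbb{R}^m$, i.e., $Y = \mathcal{A}(\Sigma) + \eta$,
and assume $\|\eta\|_1 \le \epsilon_1$. Then with probability exceeding
$1 - \exp(-c_1 m)$ the solution $\hat{\Sigma}$ to the trace minimization (5)
satisfies

$$\|\hat{\Sigma} - \Sigma\|_F \le C_1 \frac{\|\Sigma - \Sigma_r\|_*}{\sqrt{r}} + C_2 \frac{\epsilon_1}{m},$$

for all $\Sigma \in \mathbb{R}^{n \times n}$, provided that $m > c_0 n r$. Here $c_0$, $c_1$, $C_1$ and
$C_2$ are absolute constants and $\Sigma_r$ represents the best rank-$r$
approximation of $\Sigma$. When $\Sigma$ is exactly rank-$r$,

$$\|\hat{\Sigma} - \Sigma\|_F \le C_2 \frac{\epsilon_1}{m}.$$

Lemma 3. [23] If $X \sim \mathcal{W}_n(N, \Sigma)$, then for $t \ge 0$,

$$P\Big\{\Big\|\frac{1}{N} X - \Sigma\Big\| \ge \Big(\sqrt{\frac{2t(\theta + 1)}{N}} + \frac{2t\theta}{N}\Big) \|\Sigma\|\Big\} \le 2n \exp(-t),$$

where $\theta = \mathrm{tr}(\Sigma)/\|\Sigma\|$.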
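The eigenvalue perturbation bound of Lemma 1 (Weyl's inequality) is easy to verify numerically. The helper below is an illustrative check with a hypothetical name and an assumed floating-point tolerance `tol`; it tests whether every sorted eigenvalue of the perturbed matrix lies in the interval given by the lemma.

```python
import numpy as np

def weyl_bounds_hold(Sigma, Sigma_hat, tol=1e-9):
    """Check lam_hat_i in [lam_i + e_p, lam_i + e_1] for all i (eigenvalues sorted descending)."""
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    lam_hat = np.sort(np.linalg.eigvalsh(Sigma_hat))[::-1]
    e = np.linalg.eigvalsh(Sigma_hat - Sigma)     # ascending: e[0] = e_p, e[-1] = e_1
    lower_ok = np.all(lam_hat >= lam + e[0] - tol)
    upper_ok = np.all(lam_hat <= lam + e[-1] + tol)
    return bool(lower_ok and upper_ok)
```

Since $\delta = \max\{|e_1|, |e_p|\}$ is the spectral norm of the perturbation, the lemma implies $|\hat{\lambda}_i - \lambda_i| \le \delta$ for every $i$.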


Appendix C 
Proofs 
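Lemma 4. Suppose the power of the measurement in the $k$th step
is $\beta_k$. If $\delta_{k-1} \le 3\sigma^2/(4\beta_k)$, then $\delta_k \le 4\delta_{k-1}$.

Proof: Let $A_k = a_k a_k^T$, so that $\|A_k\| = \beta_k$. Then

$$E_k = E_{k-1} + \frac{\Sigma_{k-1} a_k a_k^T \Sigma_{k-1}}{a_k^T \Sigma_{k-1} a_k + \sigma^2} - \frac{\hat{\Sigma}_{k-1} a_k a_k^T \hat{\Sigma}_{k-1}}{\beta_k \hat{\lambda}_k + \sigma^2}.$$

Writing $\Sigma_{k-1} = \hat{\Sigma}_{k-1} - E_{k-1}$ and using $\hat{\Sigma}_{k-1} A_k = \hat{\lambda}_k A_k$, we obtain

$$\delta_k \le \delta_{k-1} + \frac{\beta_k \hat{\lambda}_k^2\, |a_k^T E_{k-1} a_k|}{(\beta_k \hat{\lambda}_k + \sigma^2)(\beta_k \hat{\lambda}_k + \sigma^2 - a_k^T E_{k-1} a_k)} + \frac{\hat{\lambda}_k (\|A_k E_{k-1}\| + \|E_{k-1} A_k\|) + \|E_{k-1} A_k E_{k-1}\|}{\beta_k \hat{\lambda}_k + \sigma^2 - a_k^T E_{k-1} a_k}$$

$$\le \delta_{k-1} + \frac{\beta_k^2 \hat{\lambda}_k^2 \delta_{k-1}}{(\beta_k \hat{\lambda}_k + \sigma^2)(\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1})} + \frac{\beta_k}{\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1}} \big[2 \hat{\lambda}_k \delta_{k-1} + \delta_{k-1}^2\big]$$

$$\le \Big(1 + \frac{3 \beta_k \hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1}}\Big) \delta_{k-1} + \frac{\beta_k}{\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1}} \delta_{k-1}^2.$$

Now that $\delta_{k-1} \le 3\sigma^2/(4\beta_k)$, we have $\delta_k \le 4\delta_{k-1}$. ■

Lemma 5. Consider a positive semi-definite matrix $X \in \mathbb{R}^{n \times n}$.
For $h \in \mathbb{R}^n$, if

$$Y = X - \frac{1}{h^T X h + \sigma^2} X h h^T X,$$

we have

$$\mathrm{rank}(X) = \mathrm{rank}(Y).$$

Proof: Clearly, for all $x \in \ker(X)$, $Yx = 0$, i.e.,

$$\ker(X) \subset \ker(Y).$$

Apply a decomposition for the positive semi-definite matrix,
$X = Q^T Q$. For any $x \in \ker(Y)$, let $b = Qh$, $z = Qx$. If $b = 0$,
then $Y = X$; otherwise, when $b \ne 0$, we have

$$0 = x^T Y x = z^T z - \frac{z^T b b^T z}{b^T b + \sigma^2}.$$

Thus,

$$z^T z = \frac{z^T b b^T z}{b^T b + \sigma^2} \le \frac{b^T b}{b^T b + \sigma^2} z^T z.$$

Therefore $z = 0$, i.e., $x \in \ker(X)$, so $\ker(Y) \subset \ker(X)$. This
shows that $\ker(X) = \ker(Y)$, which leads to $\mathrm{rank}(X) = \mathrm{rank}(Y)$. ■

Lemma 6. If $\delta_{k-1} \le \hat{\lambda}_k$, the true conditional covariance
matrix $\Sigma_k$ of the signal $x$ conditioned upon the measurements
$y_1, \ldots, y_k$ is related to the previous iteration as follows:

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - \frac{\beta_k \hat{\lambda}_k^2}{\beta_k \hat{\lambda}_k + \sigma^2} + \frac{3 \beta_k \hat{\lambda}_k \delta_{k-1}}{\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1}}.$$

Proof: Let $A_k = a_k a_k^T$. Then

$$E_k = E_{k-1} + \frac{\hat{\lambda}_k^2\, a_k^T E_{k-1} a_k}{(\beta_k \hat{\lambda}_k + \sigma^2)(\beta_k \hat{\lambda}_k + \sigma^2 - a_k^T E_{k-1} a_k)} A_k - \frac{\hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2 - a_k^T E_{k-1} a_k} (A_k E_{k-1} + E_{k-1} A_k) + \frac{1}{\beta_k \hat{\lambda}_k + \sigma^2 - a_k^T E_{k-1} a_k} E_{k-1} A_k E_{k-1}.$$

Note that $\mathrm{rank}(A_k) = 1$, thus $\mathrm{rank}(A_k E_{k-1}) \le 1$; therefore it
has at most one nonzero eigenvalue, and

$$|\mathrm{tr}(A_k E_{k-1})| = |\mathrm{tr}(E_{k-1} A_k)| \le \|A_k E_{k-1}\| \le \|A_k\| \|E_{k-1}\| = \beta_k \delta_{k-1}.$$

Since $E_{k-1}$ is symmetric and $A_k$ is positive semi-definite,
we have $\mathrm{tr}(E_{k-1} A_k E_{k-1}) \ge 0$. Hence,

$$\mathrm{tr}(E_k) = \mathrm{tr}(\hat{\Sigma}_k) - \mathrm{tr}(\Sigma_k) \ge \mathrm{tr}(E_{k-1}) - \frac{3 \beta_k \hat{\lambda}_k (\beta_k \hat{\lambda}_k + \frac{2}{3}\sigma^2) \delta_{k-1}}{(\beta_k \hat{\lambda}_k + \sigma^2)(\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1})}.$$

Therefore,

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - \frac{\beta_k \hat{\lambda}_k^2}{\beta_k \hat{\lambda}_k + \sigma^2} + \frac{3 \beta_k \hat{\lambda}_k \delta_{k-1}}{\beta_k \hat{\lambda}_k + \sigma^2 - \beta_k \delta_{k-1}}. \; ■$$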
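Lemma 4 (error growth by at most a factor of four per step) and Lemma 5 (rank preserved by the update) can be checked on a random instance. The sketch below is illustrative only: the dimensions, seed, power and noise variance are assumptions, the initial error is scaled to half of $3\sigma^2/(4\beta)$ so that the condition of Lemma 4 holds, and `one_step` is a hypothetical helper that measures along the top eigenvector of the assumed covariance and updates both matrices.

```python
import numpy as np

def spectral_norm(A):
    return np.linalg.norm(A, 2)

def one_step(Sigma_true, Sigma_hat, beta, sigma2):
    """Measure along the top eigenvector of Sigma_hat; update true and assumed covariances."""
    _, V = np.linalg.eigh(Sigma_hat)
    a = np.sqrt(beta) * V[:, -1]
    def update(S):
        Sa = S @ a
        return S - np.outer(Sa, Sa) / (a @ Sa + sigma2)
    return update(Sigma_true), update(Sigma_hat)

rng = np.random.default_rng(3)
n, s, sigma2, beta = 8, 3, 1.0, 5.0
F = rng.standard_normal((n, s))
Sigma = F @ F.T                                   # true rank-s covariance
G = rng.standard_normal((n, 2))
E0 = G @ G.T                                      # positive semi-definite perturbation
E0 *= 0.5 * (3 * sigma2 / (4 * beta)) / spectral_norm(E0)
Sigma_hat = Sigma + E0                            # assumed covariance

Sigma_1, Sigma_hat_1 = one_step(Sigma, Sigma_hat, beta, sigma2)
delta0 = spectral_norm(Sigma_hat - Sigma)
delta1 = spectral_norm(Sigma_hat_1 - Sigma_1)
rank_before = np.linalg.matrix_rank(Sigma, tol=1e-8)
rank_after = np.linalg.matrix_rank(Sigma_1, tol=1e-8)
```

On such an instance `delta1` stays below `4 * delta0` and the rank of the true conditional covariance is unchanged by the measurement.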


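Lemma 7. Denote $\theta = \mathrm{tr}(\Sigma)/\|\Sigma\| \ge 1$, $\mathrm{rank}(\Sigma) = s$, $M = cns$. If

$$N \ge 4 n^{1/2}\, \mathrm{tr}(\Sigma) \Big(\frac{36 c^2 n^4 s^2 \|\Sigma\|}{\tau^2} + \frac{24 c n^2 s}{\tau}\Big)$$

and

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|} \sigma^2, \; \frac{1}{\sqrt{2\, \mathrm{tr}(\Sigma) \|\Sigma\| c s n^3}} \sigma^2, \; \frac{6cns}{\tau} \sigma^2\Big\},$$

then with probability exceeding $1 - 2/n - 2/\sqrt{n} - 2n \exp(-\sqrt{n})$
we have $\|\eta\|_1 \le \tau$.

Proof: From Chebyshev's inequality, we have that

$$P\Big\{|z_i| < \frac{\tau}{6M}\Big\} \ge 1 - \frac{36 M^2 \sigma^2\, \mathrm{tr}(\Sigma)}{N L \tau^2},$$

$$P\Big\{w < \frac{M\sigma^2}{L} + \frac{\tau}{6}\Big\} \ge 1 - \frac{72 \sigma^4 M}{N L^2 \tau^2},$$

and

$$P\{b < (M + \sqrt{M}) n\} \ge 1 - \frac{2}{n}.$$

Let $\delta_s = \tau/[3n(M + \sqrt{M})]$. When

$$N \ge 4 n^{1/2}\, \mathrm{tr}(\Sigma) \Big(\frac{36 n^2 M^2 \|\Sigma\|}{\tau^2} + \frac{24 n M}{\tau}\Big),$$

with Lemma 3 we have

$$P\{\|\hat{\Sigma}_N - \Sigma\| \le \delta_s\} \ge P\Big\{\|\hat{\Sigma}_N - \Sigma\| \le \Big(\sqrt{\frac{2 n^{1/2} (\theta + 1)}{N}} + \frac{2 \theta n^{1/2}}{N}\Big) \|\Sigma\|\Big\} \ge 1 - 2n \exp(-\sqrt{n}).$$

Furthermore, when

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|} \sigma^2, \; \frac{1}{\sqrt{2\, \mathrm{tr}(\Sigma) \|\Sigma\| c s n^3}} \sigma^2, \; \frac{6cns}{\tau} \sigma^2\Big\},$$

we have

$$P\Big\{|z_i| < \frac{\tau}{6M}\Big\} \ge 1 - \frac{1}{M\sqrt{n}}, \quad P\Big\{w < \frac{\tau}{3}\Big\} \ge 1 - \frac{1}{\sqrt{n}}, \quad P\{b < (M + \sqrt{M}) n\} \ge 1 - \frac{2}{n}.$$

Therefore, $\|\eta\|_1 \le \tau$ holds with probability at least $1 - 2/n -
2/\sqrt{n} - 2n \exp(-\sqrt{n})$. ■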

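Proof of Theorem 1: Recall that for $k = 1, \ldots, K$,
$\hat{\lambda}_k \ge \varepsilon^2/\chi_n^2(p)$. With Lemma 4, provided in the Appendix,
we can show that for some constant $0 < \zeta < 1$, if $\delta_0 \le
\zeta\varepsilon^2/(4^{K+1}\chi_n^2(p))$, then for the first $K$ measurements

$$\delta_k \le \frac{1}{4^{K-k}} \cdot \frac{\zeta \varepsilon^2}{4 \chi_n^2(p)} \le \frac{3\sigma^2}{4\beta_{k+1}}.$$

By applying the result in Lemma 6 we have

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - (1 - \zeta) \frac{\beta_k \hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2} \hat{\lambda}_k \le \mathrm{tr}(\Sigma_{k-1}) - (1 - \zeta) \frac{\beta_k \hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2} \cdot \frac{\mathrm{tr}(\Sigma_{k-1})}{s}.$$

Recall that

$$f_k = 1 - \frac{1-\zeta}{s} \cdot \frac{\beta_k \hat{\lambda}_k}{\beta_k \hat{\lambda}_k + \sigma^2};$$

we have

$$\mathrm{tr}(\Sigma_k) \le f_k\, \mathrm{tr}(\Sigma_{k-1}).$$

Subsequently,

$$\mathrm{tr}(\Sigma_k) \le \Big(\prod_{j=1}^{k} f_j\Big) \mathrm{tr}(\Sigma_0).$$

Lemma 5 shows that the rank of the covariance will not
be changed by updating the covariance matrix sequentially:
$\mathrm{rank}(\Sigma_1) = \cdots = \mathrm{rank}(\Sigma_k) = s$. Hence, we may decompose
the covariance matrix $\Sigma_k = QQ^T$, with $Q \in \mathbb{R}^{n \times s}$
being a full-rank matrix; then $\mathrm{Vol}^2(\Sigma_k) = \det(Q^T Q)$. Since
$\mathrm{tr}(Q^T Q) = \mathrm{tr}(QQ^T)$, we have

$$\mathrm{Vol}^2(\Sigma_k) = \det(Q^T Q) \overset{(1)}{\le} \prod_{j=1}^{s} (Q^T Q)_{jj} \overset{(2)}{\le} \Big(\frac{\mathrm{tr}(Q^T Q)}{s}\Big)^s = \Big(\frac{\mathrm{tr}(\Sigma_k)}{s}\Big)^s,$$

where (1) follows from Hadamard's inequality and (2)
follows from the inequality of arithmetic and geometric means.
Finally, we can bound the conditional entropy of the signal as

$$H[x \mid y_j, a_j, j \le k] = \ln\big((2\pi e)^{s/2}\, \mathrm{Vol}(\Sigma_k)\big) \le \frac{s}{2} \ln\Big\{2\pi e \Big(\prod_{j=1}^{k} f_j\Big) \mathrm{tr}(\Sigma_0)\Big\}. \; ■$$

Proof of Corollary 1: Let $\theta = \mathrm{tr}(\Sigma)/\|\Sigma\| \ge 1$. We have
that for some constant $\delta_0 > 0$, when

$$L \ge 4 n^{1/2}\, \mathrm{tr}(\Sigma) \Big(\frac{\|\Sigma\|}{\delta_0^2} + \frac{4}{\delta_0}\Big),$$

with Lemma 3 we have

$$P\{\|\hat{\Sigma} - \Sigma\| \le \delta_0\} \ge P\Big\{\|\hat{\Sigma} - \Sigma\| \le \Big(\sqrt{\frac{2 n^{1/2} (\theta + 1)}{L}} + \frac{2 \theta n^{1/2}}{L}\Big) \|\Sigma\|\Big\} \ge 1 - 2n \exp(-\sqrt{n}). \; ■$$

Proof of Theorem 2: Recall that $\mathrm{rank}(\Sigma) = s$, so $\lambda_{s+1}(\Sigma) =
\cdots = \lambda_n(\Sigma) = 0$. Notice that in each step of the iteration, the
eigenvalue of $\hat{\Sigma}_k$ in the direction of $a_k$, which corresponds
to the largest eigenvalue of $\hat{\Sigma}_k$, is eliminated below the threshold.
Therefore, as long as the sequential algorithm continues, the
largest eigenvalue of $\hat{\Sigma}_k$ is exactly the $(k+1)$th largest
eigenvalue of $\hat{\Sigma}$. Now that $\delta_0 \le \varepsilon^2/(4^{s+1}\chi_n^2(p))$, with Lemma 1
and Lemma 4,

$$|\hat{\lambda}_k - \lambda_k(\Sigma)| \le \delta_0, \quad \text{for } k = 1, \ldots, s,$$

$$|\hat{\lambda}_k| \le \delta_0 \le \frac{\varepsilon^2}{4^{s+1} \chi_n^2(p)}, \quad \text{for } k = s+1, \ldots, n.$$

Notice that in the ideal case with no perturbation, the aim of
each measurement is to decrease the eigenvalue of a particular
direction to $\varepsilon^2/\chi_n^2(p)$. Suppose in the ideal scenario the
algorithm stops after $K \le s$ steps of iteration. Hence,

$$\lambda_1(\Sigma) \ge \cdots \ge \lambda_K(\Sigma) \ge \frac{\varepsilon^2}{\chi_n^2(p)}, \qquad \lambda_s(\Sigma) \le \cdots \le \lambda_{K+1}(\Sigma) < \frac{\varepsilon^2}{\chi_n^2(p)}.$$

Therefore, the power needed in the ideal case is

$$P_{\mathrm{ideal}} = \sum_{k=1}^{K} \Big(\frac{\chi_n^2(p)}{\varepsilon^2} - \frac{1}{\lambda_k(\Sigma)}\Big) \sigma^2.$$

In the noisy case, for the first $K$ steps of measurements, $1 \le
k \le K$, we choose the power

$$\beta_k = \sigma^2 \Big(\frac{1}{\varepsilon^2/\chi_n^2(p) - \delta_s} - \frac{1}{\hat{\lambda}_k}\Big).$$

We have

$$\frac{\hat{\lambda}_k \sigma^2}{\beta_k \hat{\lambda}_k + \sigma^2} = \frac{\varepsilon^2}{\chi_n^2(p)} - \delta_s.$$

For the steps $K + 1 \le k \le s$,

$$\beta_k = \max\Big\{0, \; \sigma^2 \Big(\frac{1}{\varepsilon^2/\chi_n^2(p) - \delta_s} - \frac{1}{\hat{\lambda}_k}\Big)\Big\} \le \sigma^2 \Big(\frac{1}{\varepsilon^2/\chi_n^2(p) - \delta_s} - \frac{1}{\varepsilon^2/\chi_n^2(p) + \delta_0}\Big)$$

$$\le \sigma^2 \frac{(4^s + 1) \delta_0}{(\varepsilon^2/\chi_n^2(p) - 4^s \delta_0)(\varepsilon^2/\chi_n^2(p) + \delta_0)} \le \frac{20}{51} \cdot \frac{\chi_n^2(p)}{\varepsilon^2} \sigma^2.$$

With Lemma 1, all eigenvalues of $\Sigma_s$ are no greater than

$$\delta_s + \lambda_1(\hat{\Sigma}_s) \le \frac{\varepsilon^2}{\chi_n^2(p)}.$$

And the total power is

$$P_{\mathrm{mismatch}} = \sum_{k=1}^{s} \beta_k \le \sigma^2 \Big\{\sum_{k=1}^{K} \Big[\frac{1}{\varepsilon^2/\chi_n^2(p) - \delta_s} - \frac{1}{\hat{\lambda}_k}\Big] + \frac{20(s - K)}{51} \cdot \frac{\chi_n^2(p)}{\varepsilon^2}\Big\}.$$

In order to achieve precision $\varepsilon$ and confidence level $p$, the extra
power needed is upper bounded as

$$P_{\mathrm{mismatch}} - P_{\mathrm{ideal}} \le \sigma^2 \Big\{\sum_{k=1}^{K} \Big[\frac{1}{3} \cdot \frac{\chi_n^2(p)}{\varepsilon^2} + \frac{\delta_0}{\lambda_k \hat{\lambda}_k}\Big] + \frac{20(s - K)}{51} \cdot \frac{\chi_n^2(p)}{\varepsilon^2}\Big\}$$

$$\le \sigma^2 \Big\{\frac{1}{4^{s+1}} \cdot \frac{\varepsilon^2}{\chi_n^2(p)} \sum_{k=1}^{K} \frac{1}{\lambda_k \hat{\lambda}_k} + \frac{20s - 3K}{51} \cdot \frac{\chi_n^2(p)}{\varepsilon^2}\Big\} < \Big(\frac{20}{51} s + \frac{1}{272} K\Big) \frac{\chi_n^2(p)}{\varepsilon^2} \sigma^2. \; ■$$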

Proof of Theorem 3: Let $\theta = \mathrm{tr}(\Sigma)/\|\Sigma\| \ge 1$. With
Lemma 7, let $\tau = M\delta_0/C_2$; the choice of $M$, $N$, and $L$ ensures
that $\|\eta\|_1 \le M\delta_0/C_2$ with probability at least $1 - 2/n - 2/\sqrt{n} -
2n \exp(-\sqrt{n})$. By applying Lemma 2 and noting that the rank
of $\Sigma$ is exactly $s$, we have

$$\|\hat{\Sigma} - \Sigma\|_F \le \delta_0.$$

Therefore, with probability exceeding $1 - 2/n - 2/\sqrt{n} -
2n \exp(-\sqrt{n}) - \exp(-c_0 c_1 n s)$,

$$\|\hat{\Sigma} - \Sigma\| \le \|\hat{\Sigma} - \Sigma\|_F \le \delta_0. \; ■$$